
CHAOS : A Parallelization Scheme for Training Convolutional Neural Networks on Intel Xeon Phi



Abstract

Deep learning is an important component of big-data analytic tools and intelligent applications such as self-driving cars, computer vision, speech recognition, and precision medicine. However, the training process is computationally intensive and often requires a large amount of time if performed sequentially. Modern parallel computing systems provide the capability to reduce the training time of deep neural networks. In this paper, we present our parallelization scheme for training convolutional neural networks (CNNs), named Controlled Hogwild with Arbitrary Order of Synchronization (CHAOS). Major features of CHAOS include support for thread and vector parallelism, non-instant updates of weight parameters during back-propagation without a significant delay, and implicit synchronization in arbitrary order. CHAOS is tailored for parallel computing systems accelerated with the Intel Xeon Phi. We evaluate our parallelization approach empirically using measurement techniques and performance modeling for various numbers of threads and CNN architectures. Experimental results for the MNIST dataset of handwritten digits, using the full number of threads on the Xeon Phi, show speedups of up to 103x compared to execution on a single Xeon Phi thread, 14x compared to sequential execution on an Intel Xeon E5, and 58x compared to sequential execution on an Intel Core i5.
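The abstract's central idea, worker threads writing weight updates into shared parameters without locks and synchronizing only implicitly, follows the general Hogwild approach. The sketch below is only an illustration of that idea under stated assumptions: a toy logistic-regression model stands in for a CNN, OpenMP provides the thread parallelism, and every identifier in it is hypothetical. It is not the authors' CHAOS implementation.

```cpp
// Minimal sketch of a Hogwild-style lock-free update with OpenMP.
// Toy logistic regression instead of a CNN; all names are hypothetical.
// Build (assumption): g++ -fopenmp -O2 hogwild_sketch.cpp -o hogwild_sketch
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>
#include <omp.h>

int main() {
    const int NUM_FEATURES = 16;      // toy model size
    const int NUM_EXAMPLES = 100000;  // toy dataset size
    const float learning_rate = 0.01f;

    // Synthetic dataset: label is 1 when the feature sum is positive.
    std::mt19937 rng(42);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    std::vector<float> x(NUM_EXAMPLES * NUM_FEATURES);
    std::vector<float> y(NUM_EXAMPLES);
    for (int i = 0; i < NUM_EXAMPLES; ++i) {
        float sum = 0.0f;
        for (int j = 0; j < NUM_FEATURES; ++j) {
            x[i * NUM_FEATURES + j] = dist(rng);
            sum += x[i * NUM_FEATURES + j];
        }
        y[i] = sum > 0.0f ? 1.0f : 0.0f;
    }

    // Shared weights, updated by all threads WITHOUT locks (Hogwild-style).
    // Occasional races and lost updates are tolerated rather than prevented.
    std::vector<float> w(NUM_FEATURES, 0.0f);

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < NUM_EXAMPLES; ++i) {
        // Forward pass on one example.
        float z = 0.0f;
        for (int j = 0; j < NUM_FEATURES; ++j)
            z += w[j] * x[i * NUM_FEATURES + j];
        float p = 1.0f / (1.0f + std::exp(-z));

        // Backward pass: gradient is written straight into the shared
        // weights, with no mutex and no barrier between threads.
        float grad = p - y[i];
        for (int j = 0; j < NUM_FEATURES; ++j)
            w[j] -= learning_rate * grad * x[i * NUM_FEATURES + j];
    }

    printf("w[0] = %f (max threads: %d)\n", w[0], omp_get_max_threads());
    return 0;
}
```

The inner per-feature loops are simple enough for the compiler to auto-vectorize, which loosely mirrors the combination of thread and vector parallelism mentioned in the abstract; the delayed, unsynchronized writes mirror the "non-instant updates without a significant delay" feature.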
